\section{Deployment}
Since Sparkling Water is designed as a regular Spark application, its deployment cycle is strictly driven by Spark deployment strategies (refer to Spark documentation\footnote{Spark deployment guide \url{http://spark.apache.org/docs/latest/cluster-overview.html}}). Spark applications are deployed by the \texttt{spark-submit}~\footnote{Submitting Spark applications \url{http://spark.apache.org/docs/latest/submitting-applications.html}} script that handles all deployment scenarios:

\begin{lstlisting}[style=Bash]
./bin/spark-submit \
--class <main-class>
--master <master-url> \
--conf <key>=<value> \
... # other options
<application-jar> \
[application-arguments]
\end{lstlisting}

\begin{itemize}
	\item \texttt{--class}: Name of main class with \texttt{main} method to be executed. For example, the \texttt{water.SparklingWaterDriver} application launches H2O services.
	\item \texttt{--master}: Location of Spark cluster
	\item \texttt{--conf}: Specifies any configuration property using the format \texttt{key=value}
	\item \texttt{application-jar}: Jar file with all classes and dependencies required for application execution
	\item \texttt{application-arguments}: Arguments passed to the main method of the class via the \texttt{--class} option
\end{itemize}


Sparkling Water supports deployments to the following Spark cluster types:
\begin{itemize}
	\item{Local cluster}
	\item{Standalone cluster} 
	\item{YARN cluster}
\end{itemize}

\subsection{Local cluster}
The local cluster is identified by the following master URLs - \texttt{local}, \texttt{local[K]}, or \texttt{local[*]}. In this case, the cluster is composed of a single JVM and is created during application submission.

For example, the following command will run the ChicagoCrimeApp application inside a single JVM with a heap size of 5g:
\begin{lstlisting}[style=Bash]
$SPARK_HOME/bin/spark-submit \ 
  --conf spark.executor.memory=5g \
  --conf spark.driver.memory=5g \
  --master local[*] \
  --class org.apache.spark.examples.h2o.ChicagoCrimeApp \
  sparkling-water-assembly-1.5.1-all.jar  
\end{lstlisting}


\subsection{On Standalone Cluster}
For AWS deployments or local private clusters, the standalone cluster deployment\footnote{Refer to Spark documentation~\url{http://spark.apache.org/docs/latest/spark-standalone.html}} is typical. Additionally, a Spark standalone cluster is also provided by Hadoop distributions like CDH or HDP. The cluster is identified by the URL \texttt{spark://IP:PORT}.

The following command deploys the ChicagoCrimeApp on a standalone cluster where the master node is exposed on IP mr-0xd10-precise1.0xdata.loc and port 7077:

\begin{lstlisting}[style=Bash]
$SPARK_HOME/bin/spark-submit \ 
  --conf spark.executor.memory=5g \
  --conf spark.driver.memory=5g \
  --master spark://mr-0xd10-precise1.0xdata.loc:7077 \
  --class org.apache.spark.examples.h2o.ChicagoCrimeApp \
  sparkling-water-assembly-1.5.1-all.jar  
\end{lstlisting}

In this case, the standalone Spark cluster must be configured to provide the requested 5g of memory per executor node. 

\subsection{On YARN Cluster}
Because it provides effective resource management and control, most production environments use YARN for cluster deployment.\footnote{See Spark documentation~\url{http://spark.apache.org/docs/latest/running-on-yarn.html}} 
In this case, the environment must contain the shell variable~\texttt{HADOOP\_CONF\_DIR} or \texttt{YARN\_CONF\_DIR}.

\begin{lstlisting}[style=Bash]
$SPARK_HOME/bin/spark-submit \ 
  --conf spark.executor.memory=5g \
  --conf spark.driver.memory=5g \
  --num-executors 5 \
  --master yarn-client \
  --class org.apache.spark.examples.h2o.ChicagoCrimeApp \
sparkling-water-assembly-1.5.1-all.jar  
\end{lstlisting}

The command in the example above creates a YARN job and requests 5 nodes, each with 5G of memory. The \texttt{yarn-client} option forces driver to run in the client process.
